# Import the geographic health disparities dataset from the GitHub repo
library(readr)  # provides read_csv()
urlfile <- 'https://raw.githubusercontent.com/eitanaka/DATS6101_Project1_Team2/main/dataset/geographic_health_disparities.csv'
geo_health_df <- read_csv(url(urlfile))

Abstract

This paper explores the relationships between four health conditions (depression, poor mental health, lack of sleep, and lack of physical activity) using data from the Centers for Disease Control and Prevention (CDC) PLACES project. The paper begins by examining national-level correlations between the four health conditions and identifying the strongest correlations. State-level correlations are then explored using a US map that highlights the correlations between mental health and both sleep and physical activity across most US states.

Next, multiple linear regression analyses are conducted to isolate the effect of each lifestyle factor (sleep and physical activity) on each health outcome (depression and poor mental health) while controlling for the effects of the other independent variable. The results suggest that lifestyle factors influence poor mental health more than depression and that, of these lifestyle factors, sleep has a greater influence than physical activity.

1. Introduction

Add paragraphs here

a. Dataset

We use the 2020 US geographic health disparities dataset. It offers model-based census tract-level estimates of the prevalence of 29 health outcomes, preventive service uses, chronic disease-related health risk behaviors, and health statuses as part of the 2020 U.S. Census. It covers the entire United States (50 states and the District of Columbia) at four geographic levels: county, place, census tract, and ZIP Code Tabulation Area. It therefore provides uniform small-area estimates on a national scale. The Epidemiology and Surveillance Branch of the Division of Population Health at the Centers for Disease Control and Prevention (CDC) provided the estimates. These estimates can be used to identify emerging health problems and to help develop and carry out effective, targeted public health prevention activities. Because the small-area model cannot detect effects due to local interventions, users are cautioned against using these estimates for program or policy evaluations. Data sources used to generate these model-based estimates include Behavioral Risk Factor Surveillance System (BRFSS) 2020 or 2019 data, Census Bureau 2010 population data, and American Community Survey 2015–2019 estimates.

b. SMART Question

Initial Question 1: Is there a significant relationship between geographic location in the US and the prevalence of depression, poor mental health, lack of sleep, and lack of physical activity?

Initial Question 2: Is depression correlated with the other three variables?

Research Question: To what extent are lack of sleep and lack of physical activity correlated with poor mental health and depression among tracts and states in the US? Based on modern lifestyles, we assume that lack of sleep and reduced physical activity harm mental health and contribute to depression.

c. Data cleaning and preparation

After some analysis and research, we decided to focus on a specific part of the data, which required cleaning the dataset. We kept only the rows for the health measures of interest and reshaped the dataset from long to wide. The new subset has 10 variables: six identifiers (Year, State Abbreviation, County Name, County FIPS, Location Name, and Total Population per tract) and four health measures (Depression, Mental Health, Lack of Sleep, and Lack of Leisure-Time Physical Activity), each given as the estimated percentage of the tract population with the condition.
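
The filter-and-reshape step described above can be sketched in base R as follows. This is only an illustration on made-up values: the long-format column names `MeasureId` and `Data_Value` are assumptions suggested by the wide column names (such as `Data_Value.DEPRESSION`) that appear later in this report, and the report does not state which function was actually used.

```r
# Toy long-format table: one row per tract and health measure (values are made up)
long_df <- data.frame(
  LocationName = rep(c("01001020100", "01001020200"), each = 3),
  StateAbbr    = "AL",
  MeasureId    = rep(c("DEPRESSION", "SLEEP", "ARTHRITIS"), times = 2),
  Data_Value   = c(20.1, 35.2, 25.0, 22.4, 38.9, 27.3)
)

# Keep only the rows for the measures of interest
measures  <- c("DEPRESSION", "SLEEP")
subset_df <- long_df[long_df$MeasureId %in% measures, ]

# Reshape from long to wide: one row per tract, one column per measure
wide_df <- reshape(
  subset_df,
  idvar     = c("LocationName", "StateAbbr"),
  timevar   = "MeasureId",
  direction = "wide"
)

print(wide_df)
```

Each tract ends up on a single row, with the measure name appended to `Data_Value` as a column suffix.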

d. Independent and dependent variable

2. EDA

This Exploratory Data Analysis (EDA) section aims to examine and understand the relationships between various health-related variables, such as depression, mental health, sleep, and lack of physical activity. We begin by performing basic EDA, which involves examining the data types, variable names, and the number of observations. We then check for missing values, outliers, and data distribution, addressing any issues that may impact the analysis. Next, we compute descriptive statistics to understand the data’s central tendency, variability, and spread. A series of visualizations and summary statistics for each variable at the national and state levels follows. We also investigate correlations between variables to identify potential predictors for modeling. Overall, this EDA process helps us better understand the dataset, uncover patterns, and detect potential anomalies, laying the groundwork for building accurate and reliable predictive models.

a. Basic EDA

In the Basic EDA section, we scrutinize the dataset’s structure and composition by focusing on data types, variable names, and the number of observations. Subsequently, we address missing values and identify outliers to ensure our data is reliable and robust for further analysis. This crucial step establishes a solid foundation for a deeper exploration of the relationships between health-related variables in the subsequent phases of the EDA process.

Data Structure and composition

# Look at the datatypes
str(geo_health_df)
## spc_tbl_ [72,337 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Year                 : num [1:72337] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ StateAbbr            : chr [1:72337] "AL" "AL" "AL" "AL" ...
##  $ StateDesc            : chr [1:72337] "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ CountyName           : chr [1:72337] "Baldwin" "Barbour" "Chambers" "Chilton" ...
##  $ CountyFIPS           : num [1:72337] 1003 1005 1017 1021 1031 ...
##  $ LocationName         : num [1:72337] 1.00e+09 1.01e+09 1.02e+09 1.02e+09 1.03e+09 ...
##  $ TotalPopulation      : num [1:72337] 4302 4264 3619 3808 2117 ...
##  $ Data_Value.DEPRESSION: num [1:72337] 27.6 23.1 25.9 28.2 26.7 26.4 28 27.8 28.3 21.9 ...
##  $ Data_Value.LPA       : num [1:72337] 29.5 37.9 35.6 32.3 33.9 24.2 28.9 30.7 32.3 16.4 ...
##  $ Data_Value.SLEEP     : num [1:72337] 36.2 46.4 41.4 39.9 42 36.1 36.9 40.4 41.1 30.4 ...
##  $ Data_Value.PHLTH     : num [1:72337] 13.1 15.7 15.9 14 13.7 10.4 12.4 13.9 14.8 6.7 ...
##  $ Data_Value.MHLTH     : num [1:72337] 17.6 18.3 17.3 17.5 17.3 15.2 17.1 17.6 18.2 13.2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Year = col_double(),
##   ..   StateAbbr = col_character(),
##   ..   StateDesc = col_character(),
##   ..   CountyName = col_character(),
##   ..   CountyFIPS = col_double(),
##   ..   LocationName = col_double(),
##   ..   TotalPopulation = col_double(),
##   ..   Data_Value.DEPRESSION = col_double(),
##   ..   Data_Value.LPA = col_double(),
##   ..   Data_Value.SLEEP = col_double(),
##   ..   Data_Value.PHLTH = col_double(),
##   ..   Data_Value.MHLTH = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

# Number of observations
nrow(geo_health_df)
## [1] 72337

Our dataset consists of 72,337 observations and 12 variables, including both numeric and character data types. These variables provide information on the year, state abbreviation, state description, county name, county FIPS code, location name, and total population. Moreover, the dataset includes values for depression, leisure-time physical inactivity (LPA), sleep, poor general health (PHLTH), and poor mental health (MHLTH). This data represents various health-related indicators across different locations within the United States, enabling further analysis to identify trends and relationships among these variables.
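
One detail visible in the structure output above is that CountyFIPS and LocationName are stored as numbers, which drops the leading zero of codes such as Alabama's. The sketch below shows one possible way to restore zero-padded character codes, assuming 5-digit county codes and 11-digit tract codes. The tract values are hypothetical, and this step is not part of the original analysis.

```r
# County and tract codes as they appear after being read as numbers
county_fips <- c(1003, 1005, 56001)          # 1003 is Baldwin, AL in the output above
tract_fips  <- c(1003010100, 56001962700)    # hypothetical tract codes

# Pad with leading zeros; format = "f" avoids integer overflow for 11-digit codes
county_chr <- formatC(county_fips, width = 5,  format = "f", digits = 0, flag = "0")
tract_chr  <- formatC(tract_fips,  width = 11, format = "f", digits = 0, flag = "0")

print(county_chr)
print(tract_chr)
```

Keeping the codes as character strings matters mainly when joining to other tables keyed by FIPS code. The state-level maps below use state abbreviations and are not affected.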

Check for missing values

# Check for missing values
sum(is.na(geo_health_df))
## [1] 0

Upon examining the dataset, we found no missing values across all variables. This completeness is a significant advantage, as it ensures the reliability and robustness of our analysis without requiring imputation or other techniques to address missing data. Consequently, we can confidently proceed with our exploration of the relationships between health-related variables.
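
The single total above confirms that nothing is missing. If it were non-zero, a per-column breakdown would show where the gaps are. A minimal sketch on a made-up data frame (the column names `a`, `b`, and `c` are hypothetical):

```r
# Toy data frame with one missing value in each of two columns
toy <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", NA),
  c = 1:3
)

# Number of missing values in each column
na_per_col <- colSums(is.na(toy))

# Number of rows with no missing value in any column
n_complete <- sum(complete.cases(toy))

print(na_per_col)
print(n_complete)
```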

Check for outliers and data distribution

We assess the presence of outliers in the health-related variables by analyzing their distributions. Both with and without outliers, Depression, MHLTH, and Sleep are approximately normally distributed. In contrast, LPA shows a right-skewed distribution. Comparing the two versions of the dataset reveals minimal differences between the data with and without outliers. Therefore, we choose to analyze the dataset containing outliers, expecting a minimal impact on our findings while retaining all observations.

# Check for outliers (outlierKD2 is assumed to come from the ezids package)
outlier_Depression  <- outlierKD2(geo_health_df, geo_health_df$Data_Value.DEPRESSION)
outlier_MHLTH  <- outlierKD2(geo_health_df, geo_health_df$Data_Value.MHLTH)

outlier_SLEEP  <- outlierKD2(geo_health_df, geo_health_df$Data_Value.SLEEP)

outlier_LPA  <- outlierKD2(geo_health_df, geo_health_df$Data_Value.LPA)
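
For readers without access to `outlierKD2`, the sketch below illustrates the common boxplot rule for flagging outliers (values more than 1.5 times the interquartile range beyond the hinges) using base R. That `outlierKD2` applies this same rule is an assumption here, and the values are made up.

```r
# Toy vector with one obvious outlier
x <- c(10, 12, 11, 13, 12, 11, 40)

# boxplot.stats() returns the values lying beyond 1.5 * IQR from the hinges
out <- boxplot.stats(x)$out

# Replace flagged values with NA so both versions can be compared
x_clean <- x
x_clean[x %in% out] <- NA

print(out)
print(summary(x))
print(summary(x_clean))
```

Comparing the two summaries is the same kind of with-and-without check described in the paragraph above.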

b. Descriptive Statistics

In the descriptive statistics section, our objective is to understand the central tendency, variability, and spread of various health-related indicators. We analyze the dataset at national and state levels, focusing on critical aspects of health and well-being. By calculating summary statistics, standard deviations, and mean values for each state, we create visual representations using maps and boxplots. This comprehensive analysis allows us to identify trends, patterns, and regional disparities, paving the way for a deeper exploration of the relationships and underlying factors influencing these health-related variables.

# Create a list containing the data frames by each state
data_by_state <- split(geo_health_df, geo_health_df$StateAbbr)

National Level Summary Statistics

summary_nation_DEPRESSION <- summary(geo_health_df$Data_Value.DEPRESSION)
summary_nation_MHLTH <- summary(geo_health_df$Data_Value.MHLTH)
summary_nation_SLEEP <- summary(geo_health_df$Data_Value.SLEEP)
summary_nation_LPA <- summary(geo_health_df$Data_Value.LPA)

summary_nation_DEPRESSION_df <- tidy(summary_nation_DEPRESSION)
summary_nation_MHLTH_df <- tidy(summary_nation_MHLTH)
summary_nation_SLEEP_df <- tidy(summary_nation_SLEEP)
summary_nation_LPA_df <- tidy(summary_nation_LPA)

summary_df <- rbind(summary_nation_DEPRESSION_df, summary_nation_MHLTH_df, summary_nation_SLEEP_df, summary_nation_LPA_df)
summary_df$Variable <- c("DEPRESSION", "MHLTH", "SLEEP", "LPA")
summary_df <- summary_df[,c("Variable", "minimum", "q1", "median", "mean", "q3", "maximum")]

kable(summary_df, align = "c", 
      col.names = c("Var", "Min", "1Q", "Median", "Mean", "3Q", "Max"), 
      caption = "National Level Summary Statistics of Health Variables")
National Level Summary Statistics of Health Variables

Var          Min    1Q     Median   Mean   3Q     Max
DEPRESSION   8.5    17.9   20.4     20.5   22.9   37.8
MHLTH        6.1    13.2   15.0     15.1   16.9   33.0
SLEEP        19.8   30.7   33.5     34.0   36.6   54.4
LPA          7.8    18.7   23.6     24.5   29.3   63.7

At the national level, the summary statistics reveal the following patterns for the health-related variables:

  • Depression: The percentage of the population affected by depression ranges from 8.5% to 37.8%, with a median of 20.4% and a mean of 20.5%.

  • Mental Health (MHLTH): The percentage of people experiencing poor mental health for 14 or more days ranges from 6.1% to 33.0%, with a median of 15.0% and a mean of 15.1%.

  • Sleep: The percentage of the population lacking sufficient sleep ranges from 19.8% to 54.4%, with a median of 33.5% and a mean of 34.0%.

  • Leisure-time Physical Inactivity (LPA): The percentage of the population reporting no leisure-time physical activity ranges from 7.8% to 63.7%, with a median of 23.6% and a mean of 24.5%.

These results highlight the necessity of examining the complex relationships between these variables to better understand their correlation and develop effective strategies for improving public health.

The map-based analysis

The map-based analysis presents an overview of the mean values for four health conditions across U.S. states. These color-coded maps use darker shades to indicate higher mean values for each health condition. The first map displays mean depression values, revealing a higher prevalence in the eastern region, particularly around West Virginia, and in Washington in the West. The second map shows the average rates of poor mental health, which are more common in the eastern states than in the West. The third map illustrates average sleep deprivation levels, demonstrating a higher prevalence in the eastern part of the country compared to the western region. Lastly, the fourth map highlights the percentages of individuals lacking leisure-time physical activity, with the Southeast displaying exceptionally high rates. These findings suggest that, on average, the East experiences worse health outcomes than the West for the variables of interest. This information is important for understanding regional health disparities and informing targeted public health interventions.

# Summary statistic about percentage of population affected with depression in each state
summary_by_state_DEPRESSION <- lapply(data_by_state, function(x) summary(x$Data_Value.DEPRESSION))
# head(summary_by_state_DEPRESSION,3)
# tail(summary_by_state_DEPRESSION,3)

# A list of standard deviation for percentage of population affected with Depression in each state
sd_by_state_DEPRESSION<- lapply(data_by_state, function(x) sd(x$Data_Value.DEPRESSION))
# head(sd_by_state_DEPRESSION,3)
# tail(sd_by_state_DEPRESSION,3)

# A data frame containing the mean tract-level prevalence of depression in each state
mean_by_state_DEPRESSION <- lapply(data_by_state, function(x) mean(x$Data_Value.DEPRESSION))
mean_df_DEPRESSION <- data.frame(State = names(mean_by_state_DEPRESSION), Mean_Depression = unlist(mean_by_state_DEPRESSION))
mean_df_DEPRESSION["fips"] <- fips(mean_df_DEPRESSION$State)

# A map of the U.S. showing the mean percentage of people with depression in each state
# (the color bar is configured in guides() below, so no guide argument is needed here)
plot_usmap(data=mean_df_DEPRESSION, values="Mean_Depression", labels = TRUE) +
  scale_fill_continuous(low = "white", high = "red") +
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) +
  ggtitle("Mean Depression by State") +
  guides(fill = guide_colorbar(title = "Mean %", 
                                title.position = "top", 
                                title.hjust = 0.5, 
                                label.position = "left",
                                label.hjust = 0.5))

# Summary statistics per state: percentage of the tract population reporting 14 or more days of poor mental health
summary_by_state_MHLTH <- lapply(data_by_state, function(x) summary(x$Data_Value.MHLTH))
# head(summary_by_state_MHLTH,3)
# tail(summary_by_state_MHLTH,3)

# Standard deviation per state: percentage of the tract population reporting 14 or more days of poor mental health
sd_by_state_MHLTH <- lapply(data_by_state, function(x) sd(x$Data_Value.MHLTH))
# head(sd_by_state_MHLTH,3)
# tail(sd_by_state_MHLTH,3)

# A data frame of mean values per state: percentage of the tract population reporting 14 or more days of poor mental health
mean_by_state_MHLTH <- lapply(data_by_state, function(x) mean(x$Data_Value.MHLTH))
mean_df_MHLTH <- data.frame(State = names(mean_by_state_MHLTH), Mean_MHLTH = unlist(mean_by_state_MHLTH))
mean_df_MHLTH["fips"] <- fips(mean_df_MHLTH$State)

# Map of the United States showing the mean percentage of the population with poor mental health by state
plot_usmap(data=mean_df_MHLTH, values="Mean_MHLTH", labels = TRUE) +
  scale_fill_continuous(low = "white", high = "green") +
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) +
  ggtitle("Mean Mental Health by State") +
  guides(fill = guide_colorbar(title = " Mean %", 
                                title.position = "top", 
                                title.hjust = 0.5, 
                                label.position = "left",
                                label.hjust = 0.5))

# A list of summary statistics per state: percentage of the tract population lacking sleep
summary_by_state_SLEEP <- lapply(data_by_state, function(x) summary(x$Data_Value.SLEEP))
# head(summary_by_state_SLEEP,3)
# tail(summary_by_state_SLEEP,3)

# A list of standard deviations per state: percentage of the tract population lacking sleep
sd_by_state_SLEEP <- lapply(data_by_state, function(x) sd(x$Data_Value.SLEEP))
# head(sd_by_state_SLEEP,3)
# tail(sd_by_state_SLEEP,3)

# A data frame of mean values per state: percentage of the tract population lacking sleep
mean_by_state_SLEEP <- lapply(data_by_state, function(x) mean(x$Data_Value.SLEEP))
mean_df_SLEEP <- data.frame(State = names(mean_by_state_SLEEP), Mean_Sleep = unlist(mean_by_state_SLEEP))
mean_df_SLEEP["fips"] <- fips(mean_df_SLEEP$State)

# Map of the United States showing the mean percentage of the population lacking sleep by state
plot_usmap(data=mean_df_SLEEP, values="Mean_Sleep", labels = TRUE) +
  scale_fill_continuous(low = "white", high = "purple") +
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) +
  ggtitle("Mean Sleep by State") +
  guides(fill = guide_colorbar(title = "Mean %", 
                                title.position = "top", 
                                title.hjust = 0.5, 
                                label.position = "left",
                                label.hjust = 0.5))

# A list of summary statistics per state: percentage of the tract population with no leisure-time physical activity
summary_by_state_LPA <- lapply(data_by_state, function(x) summary(x$Data_Value.LPA))
# head(summary_by_state_LPA,3)
# tail(summary_by_state_LPA,3)

# A list of standard deviations per state: percentage of the tract population with no leisure-time physical activity
sd_by_state_LPA <- lapply(data_by_state, function(x) sd(x$Data_Value.LPA))
# head(sd_by_state_LPA,3)
# tail(sd_by_state_LPA,3)

# A data frame of mean values per state: percentage of the tract population with no leisure-time physical activity
mean_by_state_LPA <- lapply(data_by_state, function(x) mean(x$Data_Value.LPA))
mean_df_LPA <- data.frame(State = names(mean_by_state_LPA), Mean_LPA = unlist(mean_by_state_LPA))
mean_df_LPA["fips"] <- fips(mean_df_LPA$State)

# Map of the United States showing the mean percentage of the population with no leisure-time physical activity by state
plot_usmap(data=mean_df_LPA, values="Mean_LPA", labels = TRUE) +
  scale_fill_continuous(low = "white", high = "cyan") +
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) +
  ggtitle("Mean Lack of Physical Activity by State") +
  guides(fill = guide_colorbar(title = "Mean %", 
                                title.position = "top", 
                                title.hjust = 0.5, 
                                label.position = "left",
                                label.hjust = 0.5))

# create mean_df for all four conditions
mean_df <- merge(merge(mean_df_DEPRESSION, mean_df_MHLTH, by=c("State", "fips")), mean_df_SLEEP, by=c("State", "fips"))
mean_df <- merge(mean_df, mean_df_LPA, by=c("State", "fips"))
colnames(mean_df)[3:6] <- c("Depression", "MHLTH", "Sleep", "LPA")
# head(mean_df)
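
The per-variable blocks above repeat the same split, mean, and merge pattern four times. As a hedged alternative, the sketch below computes several state means in one call with base R's `aggregate()`. The toy data frame and its values are made up, and only two of the four measures are shown.

```r
# Toy tract-level data for two states (values are made up)
toy <- data.frame(
  StateAbbr             = c("AL", "AL", "WV", "WV"),
  Data_Value.DEPRESSION = c(20, 24, 28, 30),
  Data_Value.SLEEP      = c(36, 40, 42, 44)
)

# One row per state, one mean column per measure
state_means <- aggregate(
  cbind(Data_Value.DEPRESSION, Data_Value.SLEEP) ~ StateAbbr,
  data = toy,
  FUN  = mean
)

print(state_means)
```

As in the original code, these are unweighted means of the tract estimates, so every tract counts equally regardless of its population.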

The Boxplot analysis

The boxplot analysis visualizes the distribution and spread of the four health-related variables across U.S. states. Each point represents one state's mean value for the variable, showing the variation between states.

Key features of the boxplots include the largest and smallest values for each health variable: Depression is highest in West Virginia (WV) and lowest in Hawaii (HI); leisure-time physical inactivity (LPA) peaks in Kentucky (KY) and reaches a minimum in Utah (UT); poor mental health (MHLTH) is most prevalent in WV and least common in South Dakota (SD); and sleep deprivation is highest in HI and lowest in Minnesota (MN).

Among the health variables, sleep deprivation has the largest mean value, followed by LPA, depression, and MHLTH. This boxplot analysis highlights the disparities in health outcomes across states, which is essential for understanding regional differences and informing targeted public health interventions.

# create a long format of the data
mean_df_long <- gather(mean_df, key = "Variable", value = "Value", -State, -fips)

# group data by Variable and get the max and min values for each group
max_min_df <- mean_df_long %>% group_by(Variable) %>% 
  slice(which.max(Value), which.min(Value)) %>% ungroup()

# create a box plot for each variable and facet by variable
ggplot(mean_df_long, aes(x = "", y = Value, fill = Variable)) +
  geom_boxplot() +
  geom_jitter(aes(color = Variable), width = 0.2, size = 2) +
  facet_wrap(~Variable, ncol = 4, scales = "fixed") +
  scale_fill_manual(values = c("pink", "cyan", "lightgreen", rgb(200, 162, 200, maxColorValue = 255))) +
  scale_color_manual(values = c("darkred", "darkblue", "darkgreen", "darkorchid")) +
  labs(title = "Health Condition Distributions", x="Health Conditions", y = "% of State Pop. with Condition") +
  geom_text(data = max_min_df, aes(x = 1.25, y = Value, label = State), size = 4, fontface = "bold", hjust = -0.2, color = "black") # add state labels for max and min values

A table of the mean, median, and standard deviation of each variable (depression, sleep, and mental health), grouped by state name or FIPS code, makes it possible to compare how each variable varies across states.
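
A minimal sketch of such a per-state table, following the split and `lapply` pattern used earlier in this section. The data frame and its values are made up, and only one variable is shown.

```r
# Toy tract-level data for two states (values are made up)
toy <- data.frame(
  StateAbbr             = rep(c("AL", "WV"), each = 3),
  Data_Value.DEPRESSION = c(20, 22, 27, 28, 30, 35)
)

# Compute mean, median, and standard deviation within each state
state_stats <- do.call(rbind, lapply(split(toy, toy$StateAbbr), function(d) {
  data.frame(
    State  = d$StateAbbr[1],
    Mean   = mean(d$Data_Value.DEPRESSION),
    Median = median(d$Data_Value.DEPRESSION),
    SD     = sd(d$Data_Value.DEPRESSION)
  )
}))

print(state_stats)
```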

c. Correlations

In this section, we explore the relationships between the variables and identify any correlations among them. This helps to identify potential predictors for modeling.

# Compute the correlation matrix
# create national correlation matrix
cor_matrix <- cor(mean_df[ , c("Depression", "MHLTH", "Sleep", "LPA")])

# create national correlation heat map
corrplot(cor_matrix, method = "color")

# create mixed national correlation heat map
mixed_cor_heat_map <- corrplot.mixed(cor_matrix, 
                                     main = "Correlation Between Health Conditions (National)",
                                     mar = c(0,0,2,0))

mixed_cor_heat_map

# Create a data frame containing state, FIPS code, and the within-state correlations MHLTH_SLEEP and MHLTH_LPA
cor_by_state_matrix <- lapply(data_by_state, function(state) cor(state[c("Data_Value.DEPRESSION", "Data_Value.MHLTH", "Data_Value.SLEEP", "Data_Value.LPA")]))
cor_by_state_df <- data.frame(
    state = names(cor_by_state_matrix),
    # index the correlation matrix by name rather than by position
    MHLTH_SLEEP = sapply(cor_by_state_matrix, function(m) m["Data_Value.MHLTH", "Data_Value.SLEEP"]),
    MHLTH_LPA = sapply(cor_by_state_matrix, function(m) m["Data_Value.MHLTH", "Data_Value.LPA"])
)
cor_by_state_df["fips"] <- fips(cor_by_state_df$state)
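
The correlation matrices above report coefficients only. One way to check whether a given correlation is statistically significant is `cor.test()`. The sketch below uses made-up values, and the vector names `sleep` and `mhlth` are hypothetical stand-ins for two columns of state means. It is not part of the original analysis.

```r
# Made-up paired values standing in for two state-level variables
sleep <- c(30, 31, 33, 34, 36, 37, 39, 40, 42, 44)
mhlth <- c(12.9, 13.4, 14.1, 14.0, 15.2, 15.1, 16.3, 16.0, 17.4, 18.1)

# Pearson correlation with a two-sided test of the null hypothesis rho = 0
res <- cor.test(sleep, mhlth, method = "pearson")

print(res$estimate)  # correlation coefficient
print(res$p.value)   # p-value of the test
print(res$conf.int)  # 95% confidence interval
```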

Scatter plots of the two health risk behaviors against the health outcome and the health status

# create scatter plots
# sleep scatter plot
ggplot(mean_df, aes(x = Sleep)) +
  geom_point(aes(y = Depression, color = "Depression")) +
  geom_point(aes(y = MHLTH, color = "MHLTH")) +
  scale_color_manual(name = "Health Outcomes", values = c("Depression" = "red", "MHLTH" = "green")) +
  labs(x = "% Lacking Sleep", 
       y = "% With Health Outcomes",
       title = "Correlation Between Lack of Sleep and Health Outcomes") +
  # set the line color outside aes() so it is a fixed color rather than an unmapped legend level
  geom_smooth(aes(y = Depression), color = "black", method = "lm", se = TRUE) +
  geom_smooth(aes(y = MHLTH), color = "black", method = "lm", se = TRUE)

# lpa scatter plot
ggplot(mean_df, aes(x = LPA)) +
  geom_point(aes(y = Depression, color = "Depression")) +
  geom_point(aes(y = MHLTH, color = "MHLTH")) +
  scale_color_manual(name = "Health Outcomes", values = c("Depression" = "red", "MHLTH" = "green")) +
  labs(x = "% Lacking Physical Activity", 
       y = "% With Health Outcomes",
       title = "Correlation Between Lack of Physical Activity and Health Outcomes") +
  # set the line color outside aes() so it is a fixed color rather than an unmapped legend level
  geom_smooth(aes(y = Depression), color = "black", method = "lm", se = TRUE) +
  geom_smooth(aes(y = MHLTH), color = "black", method = "lm", se = TRUE)

State-level correlations between MHLTH and sleep and between MHLTH and LPA, shown on U.S. maps

# U.S. map of the within-state correlation between MHLTH and Sleep
plot_usmap(data=cor_by_state_df, values="MHLTH_SLEEP", labels = TRUE) +
  scale_fill_continuous(low = "white", high = "blue") +
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) +
  ggtitle("Correlation Between Mental Health and Sleep") +
  guides(fill = guide_colorbar(title = "Correlation",
                                title.position = "top",
                                title.hjust = 0.5,
                                label.position = "left",
                                label.hjust = 0.5))

# U.S. map of the within-state correlation between MHLTH and LPA
plot_usmap(data=cor_by_state_df, values="MHLTH_LPA", labels = TRUE) +
  scale_fill_continuous(low = "white", high = "blue") +
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) +
  ggtitle("Correlation Between Mental Health and LPA") +
  guides(fill = guide_colorbar(title = "Correlation",
                                title.position = "top",
                                title.hjust = 0.5,
                                label.position = "left",
                                label.hjust = 0.5))

d. Hypothesis testing

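To test whether the lifestyle factors are associated with each health outcome, we fit multiple linear regressions of depression and of poor mental health on lack of sleep and LPA, using the state-level means.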

Linear Regression Test

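# Multiple linear regressions on the state-level means
# Note: "- 1" removes the intercept, so each model is fit through the origin
depression.mlg <- lm(Depression ~ Sleep + LPA - 1, data = mean_df)
mhlth.mlg <- lm(MHLTH ~ Sleep + LPA - 1, data = mean_df)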


# mlg tables

depression.mlg.table <- stargazer(depression.mlg,
                                  type = "text",
                                  title = "Multiple Linear Regression for Depression, Sleep, and LPA",
                                  header = FALSE,
                                  digits = 4,
                                  star.cutoffs = c(0.05, 0.01, 0.001),
                                  report = "vcstp*")

mhlth.mlg.table <- stargazer(mhlth.mlg,
                                  type = "text",
                                  title = "Multiple Linear Regression for Mental Health, Sleep, and LPA",
                                  header = FALSE,
                                  digits = 4,
                                  star.cutoffs = c(0.05, 0.01, 0.001),
                                  report = "vcstp*")


# # mlg partial regression plots
# mhlth.lpa.mlg.partial.reg.plot <- qqp(mhlth.mlg, "LPA")
# 
# mhlth.sleep.mlg.partial.reg.plot <- qqp(mhlth.mlg, "Sleep")
# 
# # mlg coefficient plots
# dep.mlg.coefplot <- coefplot(depression.mlg, exclude = TRUE, title = "Depression")
# dep.mlg.coefplot
# 
# mhlth.mlg.coefplot <-coefplot(mhlth.mlg, exclude = TRUE, title = "MHlTH")
# mhlth.mlg.coefplot
# 
# # mlg residual plots
# dep.residual.plot <- plot(depression.mlg, which = 1)
# dep.residual.plot
# 
# mhlth.residual.plot <- plot(mhlth.mlg, which = 1)
# mhlth.residual.plot
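
The models above are fit without an intercept. The sketch below, on made-up values, shows how the same formula behaves with and without the `- 1` term and how to pull the coefficient table (estimates, standard errors, t values, and p-values) out of a fitted model. The data frame `toy` is hypothetical and the numbers do not come from the study.

```r
# Made-up state-level means (not the study data)
toy <- data.frame(
  Sleep = c(30, 32, 34, 36, 38, 40),
  LPA   = c(20, 25, 22, 30, 28, 35),
  MHLTH = c(13.1, 14.2, 14.4, 16.0, 16.1, 17.9)
)

# Same predictors, with and without an intercept term
fit_with    <- lm(MHLTH ~ Sleep + LPA, data = toy)
fit_without <- lm(MHLTH ~ Sleep + LPA - 1, data = toy)

# Coefficient tables: Estimate, Std. Error, t value, Pr(>|t|)
print(coef(summary(fit_with)))
print(coef(summary(fit_without)))
```

R computes R-squared differently for models without an intercept, so R-squared values from the two specifications are not directly comparable.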

For depression, we fail to reject the null hypothesis, whereas the test for poor mental health is significant. This suggests that poor mental health is associated with lack of sleep and LPA.

3. Result

Add paragraphs here

4. Discussion

a. Limitations

This research has several limitations that should be considered. First, the cross-sectional nature of the data prevents the establishment of causal relationships between lifestyle factors and mental health outcomes, as only associations can be inferred. Second, the research is limited to the variables available in the dataset, potentially omitting other relevant factors, such as nutrition, social support, or access to healthcare, which could influence the observed relationships.

Moreover, the study relies on self-reported data for some variables, which may be subject to recall bias or social desirability bias, potentially affecting the accuracy of the results. Additionally, the analysis does not take into account potential confounding variables or interactions between variables that could influence the associations between lifestyle factors and mental health outcomes.

b. Future Enhancement

This study suggests that lifestyle factors such as sleep and physical activity may have an impact on mental health outcomes. Future research could expand the scope of the analysis to incorporate additional variables such as nutritional habits, social support, access to healthcare, and socioeconomic factors. Longitudinal and time-series data could also be employed to investigate causal relationships between lifestyle factors and mental health outcomes, as well as the impact of the COVID-19 pandemic on these relationships. Examining potential confounding variables and interactions between variables, applying more advanced statistical techniques, and conducting qualitative research could provide deeper insights into these relationships.

5. Conclusion

Add paragraphs here

6. References

Add references here